Abstract

Two separate algorithms for calculating the inter- 
mediate states, using cellular automata and the ini- 
tial conditions in the rate matrix for the diffusion- 
collision model are introduced. They enable easy and 
fast calculations of the folding probabilities of the in- 
termediate states, even for a very large number of 
microdomains. 

Introduction 

In recent years, many theoretical and experimental 
studies have focused on the problem of describing the 
mechanism of protein folding. The goal is to develop 
a model that predicts protein folding rates and their 
dependence on factors such as temperature, amino 
acid sequences and so on. There are two aspects 
to the prediction problem: one is predicting the na- 
tive structure of a protein from its sequence, which 
is thermodynamic in character; the other concerns 
the mechanism by which denatured proteins fold to
their native conformation, and is dynamic in character.
The dynamical aspects of folding are often formulated
taking into account Levinthal's paradox [1], which states
that a random search of all possible structures would
take longer than the age of the universe.

A model that gives satisfactory predictions regard- 
ing the dynamical aspects of the protein folding is 
the diffusion-collision model of Karplus and Weaver 
[Q] (see Burton, Myers and Oas [Q] for a recent 
experimental test of the model). This model considers
the protein to be made of secondary structure elements,
called microdomains, each short enough for a rapid
conformational search, so that Levinthal's paradox is
avoided. Microdomain-microdomain folding
is considered to occur as diffusion through solution,
with some collisions between microdomains leading 
to smaller and then larger structures, until the na- 
tive conformation is reached. The randomness of the 
diffusion process indicates a very important charac- 
teristic of the model: that the folding process may 
involve many possible paths, not just one pathway, 
leading to the folded state. Furthermore, the model 
can calculate the probabilities of kinetic intermediate 
states at any moment in time. 



DESCRIPTION OF THE 
DIFFUSION-COLLISION MODEL 

The diffusion-collision model has the following
properties:

- Presence of microdomains;

- Transient secondary structure is formed before
tertiary structure;

- Transient accumulation of kinetic intermediates;

- Existence of folding pathways;

- Possible existence of non-native intermediates
from non-native collisions;

- Solvent viscosity dependence of folding rates;

- Folding rates and favored pathways dependent on
properties of microdomains.

The microdomains move diffusively under the influence
of internal and random external forces, and
microdomain-microdomain collisions occur. The dynamics
of folding is simulated by a set of diffusion equations
that describe the motion of the microdomains in aqueous
solution, and by boundary conditions that provide for
microdomain collision and possible coalescence. The
diffusion-collision dynamics is represented as a network
of steps, each containing a microdomain pair interaction,
in which the rate of coalescence depends on the physical
properties of the microdomains. The rates can be
analytically expressed in terms of the physical parameters
of the system.

Let us consider the following analytical model 
to calculate the folding rate of two connected mi- 
crodomains, which is the elementary step in the 
diffusion-collision model. Consider two connected mi- 
crodomains A and B that coalesce into AB. 



$$ A + B \rightarrow AB \qquad (1) $$

The dynamical behavior of the microdomains is 
modeled by a diffusion equation. Since we have a 
system of two microdomains, the equations are cou- 
pled ||. The relative motion diffusion equation is 



$$ \frac{\partial}{\partial t} \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} = \begin{pmatrix} D\nabla^2 - \lambda_1 & \lambda_2 \\ \lambda_1 & D\nabla^2 - \lambda_2 \end{pmatrix} \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} \qquad (2) $$

where $\rho$ is a two-element vector, $\rho_1$ being the
probability density for both microdomains folded, and
$\rho_2$ the probability density for all other possibilities.
$D$ is the relative diffusion constant, and the rate
constants are $\lambda_1$, from the both-folded state to all
others, and $\lambda_2$, the rate for the reverse process.
Equation 2 couples microdomain-microdomain relative
diffusion with the two-state folding-unfolding process
carried out in solution by the microdomains. The
connecting chain between them limits the diffusion space
for microdomain-microdomain relative motion. An
idealization is made that microdomains are spheres
connected by a polypeptide chain considered to be a
flexible featureless string. The collision and coalescence
of the microdomains are governed by the boundary
conditions for Equation 2. The inner boundary is the
closest approach spherical shell, in terms of van der
Waals envelopes of the microdomains. The closest
approach distance of two microdomains is the sum of
their radii, $R_{\min}$. The other constraint on the
diffusion space is the maximal radial separation
$R_{\max}$ between the microdomains, determined by the
length of the string between A and B. So we have:



$$ R_{\min} = R_A + R_B \qquad (3) $$

$$ R_{\max} = R_A + R_B + \text{shortest intervening chain length} $$

The boundary conditions on the probability density
are specified as:

$$ \left. \frac{\partial \rho}{\partial r} \right|_{R_{\max}} = 0 \qquad (4) $$

which means that the microdomains cannot get
further away from one another than $R_{\max}$, and the
condition:

$$ \left. \frac{\partial \rho_2}{\partial r} \right|_{R_{\min}} = 0 \qquad (5) $$

meaning that the unfolded microdomains cannot
get closer to one another than $R_{\min}$. Finally:

$$ \left. \rho_1 \right|_{R_{\min}} = 0 \qquad (6) $$

indicating that the both-states-folded probability
density $\rho_1$ is zero at the inner boundary, meaning that
coalescence takes place.

The forward (folding) rate of coalescence of two
microdomains to form a bond is taken to be
$k_f = 1/\tau_f$ during intermolecular diffusion, where
$\tau_f$ is the folding time, and has the following
general form:

$$ \tau_f = \frac{l^2}{D} + \frac{L \, \Delta V \, (1 - \beta)}{\beta D A} \qquad (7) $$



Following Ref. 3, $D$ is the relative diffusion
coefficient, $\Delta V$ is the volume available for diffusion
of each microdomain pair, $A$ is their relative target
surface area for collision, and $\beta$ is the probability
that the two microdomains are in a folded state when they
collide, so that there is no barrier to coalescence. $L$ and
$l$ are geometrical parameters that satisfy the boundary
conditions for the diffusion equation in three dimensions.
$L$ has units of length and the value:



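$$ \frac{1}{L} = \frac{1}{R_{\min}} + \alpha \, \frac{\alpha R_{\max} \tanh[\alpha (R_{\max} - R_{\min})] - 1}{\alpha R_{\max} - \tanh[\alpha (R_{\max} - R_{\min})]} \qquad (8) $$

where $\alpha = ((\lambda_1 + \lambda_2)/D)^{1/2}$.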
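As a numerical illustration, the following Python sketch evaluates the geometrical length $L$ and the folding time $\tau_f$. It assumes the forms $1/L = 1/R_{\min} + \alpha(\alpha R_{\max}\tanh[\alpha(R_{\max}-R_{\min})] - 1)/(\alpha R_{\max} - \tanh[\alpha(R_{\max}-R_{\min})])$ for Eq. 8 and $\tau_f = l^2/D + L\Delta V(1-\beta)/(\beta D A)$ for Eq. 7. The function names and the sample numbers are hypothetical and are not taken from the authors' MATLAB programs.

```python
import math

def alpha(lambda_1, lambda_2, diffusion):
    """alpha = sqrt((lambda_1 + lambda_2) / D)."""
    return math.sqrt((lambda_1 + lambda_2) / diffusion)

def geometric_length(r_min, r_max, a):
    """Length L from the assumed form of Eq. 8."""
    t = math.tanh(a * (r_max - r_min))
    inverse_l = 1.0 / r_min + a * (a * r_max * t - 1.0) / (a * r_max - t)
    return 1.0 / inverse_l

def folding_time(l, big_l, diffusion, delta_v, area, beta):
    """Folding time tau_f from the assumed form of Eq. 7."""
    return l * l / diffusion + big_l * delta_v * (1.0 - beta) / (beta * diffusion * area)

# Hypothetical inputs, chosen only to exercise the functions.
a = alpha(1.0, 1.0, 2.0)
big_l = geometric_length(10.0, 30.0, a)
tau_f = folding_time(3.0, big_l, 2.0, 100.0, 50.0, 0.5)
print(a, big_l, tau_f)
```

For large $\alpha$ the hyperbolic tangent saturates and $1/L$ approaches $1/R_{\min} + \alpha$, which is a convenient sanity check on any implementation.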

For the backward (unfolding) rate $k_b = 1/\tau_b$, the
unfolding time $\tau_b$ has the following form:

$$ \tau_b = \nu^{-1} \exp\left( \frac{f A_{AB}}{k_B T} \right) \qquad (9) $$

The actual contact surface area $A_{AB}$ between
microdomains A and B is used, as well as $f$, the free
energy change per unit area between the microdomains
involved in the bond. The dissociation rate in the
absence of an energy barrier is given by $\nu$, $k_B$ is the
Boltzmann constant, and $T$ is the absolute temperature.

Defining the folding and the unfolding rates in 
this way, for each two- microdomain process, multi- 
microdomain protein folding can be treated as a set 
of two-microdomain interactions, between all the pos- 
sible pair combinations of interactions. The process 
is continuous in time and the second order diffusion 
partial differential equation, reduces || to a system 
of linear first order (in time) equations represented 
by the following vector equation with elements: 



$$ \frac{dp_i}{dt} = \sum_j p_j R_{ji} \qquad (10) $$

Here $p_j$ is the probability of a kinetic intermediate
state, and would correspond to the concentration of a
substance if the diffusion equation described
the time varying difference in concentration between
several adjacent, spatially discrete regions. The
elements $R_{ij}$ of the rate matrix $R$ are determined from
the folding and unfolding rates, $k_f$ and $k_b$. If $R$ can
be diagonalized,

$$ R = S \Lambda S^{-1} \qquad (11) $$

then the vector equation can be solved: by
standard linear algebra procedures, first finding the
eigenvectors and the eigenvalue matrix $\Lambda$ of the rate
matrix $R$, the probabilities $p_i$ can be obtained as
exponential functions of time:

$$ p(t) = p(0) \, S e^{\Lambda t} S^{-1} \qquad (12) $$
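The time evolution in Eq. 12 can be illustrated with a small Python sketch. As an assumption made to keep the example dependency-free, the matrix exponential is evaluated by a Taylor series with scaling and squaring rather than by the diagonalization described above; in practice an eigendecomposition routine would be used. The two-state rate matrix and the rate values are hypothetical, and p is treated as a row vector so that p(t) = p(0) exp(Rt).

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=12, terms=20):
    """Approximate exp(m) by a Taylor series combined with scaling and squaring."""
    n = len(m)
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def propagate(p0, rate_matrix, t):
    """Row-vector convention: p(t) = p(0) exp(R t)."""
    e = mat_exp([[x * t for x in row] for row in rate_matrix])
    n = len(p0)
    return [sum(p0[i] * e[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state system: unfolded <-> folded.
k_f, k_b = 2.0, 1.0
R = [[-k_f, k_f],
     [k_b, -k_b]]
p = propagate([1.0, 0.0], R, 0.7)
print(p)
```

For this two-state case the folded probability has the closed form $k_f/(k_f+k_b)\,(1 - e^{-(k_f+k_b)t})$, which can be used to check the numerical result.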



EXAMPLE OF DIFFUSION-COLLISION MODEL 
CALCULATIONS 

Let us consider a simple protein chain, made of 
three microdomains as shown in Figure 1, with the 
microdomain properties given in Table I. 



Figure 1. A representation of a 3-microdomain unfolded protein chain.
The possible pairings for this simple protein chain are AB, AC and BC.



The actual pairings that are possible for this three 
microdomain protein are: AB, AC and BC. Ac- 
cording to the diffusion-collision model, we have the 
schematic description of the possible states shown in 
Table II. 

Here we have 8 possible states, the initial unfolded 
state #1, all 6 possible kinetic intermediate states, 
and the final folded state #8. The states are associ- 
ated with the coalescence of the corresponding pair- 
ings that are indicated in Table II by the digit 1. A 
schematic view is shown in Figure 2. Based on the 
initial data, Table III of transitions, states, bonds and 
parameters can be obtained. For a larger number n of
pairings between the microdomains, obtaining the data in
Table III is the most difficult part of the
diffusion-collision calculation. The data from Table III
go into the calculations of the folding and unfolding
rates, $k_f$ and $k_b$. Finally, the rate equation (Eq. 10)
is solved to get $p_i$, the probabilities of the states as a
function of time.

Several things can be generalized and noted for n
pairings between the microdomains in a folding protein.
First, it can be noted that the number of states
is $2^n$: since we consider one two-microdomain interaction
at a time, each pairing is either coalesced or not, and
the number of ways in which a population of n elements
can be divided into two subpopulations is $2^n$. So we have:

- $2^n$ states (8 for n = 3)

- $n!$ independent pathways (6 for n = 3)

- $n 2^{n-1}$ transitions (12 for n = 3)
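These three counts are easy to tabulate. The short Python sketch below (the function name is a hypothetical choice) evaluates the number of states, independent pathways and transitions for n pairings, following the formulas $2^n$, $n!$ and $n 2^{n-1}$.

```python
from math import factorial

def diffusion_collision_counts(n):
    """Return (states, independent pathways, transitions) for n pairings."""
    return 2 ** n, factorial(n), n * 2 ** (n - 1)

for n in range(1, 10):
    states, pathways, transitions = diffusion_collision_counts(n)
    print(n, states, pathways, transitions)
```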

For a larger number of microdomains and pairings 
between the microdomains, calculating the possible 
states, independent pathways, and all of the transi- 
tions can be a very tedious task. As can be seen 
from Table IV, the number of states, independent 
pathways, and the number of transitions, increases 
quickly with increasing n, the number of pairings be- 
tween the microdomains of a protein. As n increases, 
the steepest increase is in the number of independent 
pathways, which grows as a factorial. It is interest- 
ing to note that this rapid increase in the number of 
possible native pathways may lend evolutionary sta- 
bility to protein native structures, since they can be 
reached via multiple pathways, and blocking of one 
or more routes will probably not affect the folded 
state. From the point of view of actual calculations,
these large numbers create a limiting problem: finding
the actual states and pathways and, more specifically,
calculating $R_{\max}$, which becomes a practical problem
when there are a large number of pathways
corresponding to different configurations of
microdomain interactions.

The two algorithms described below are used to 
speed up and simplify the diffusion-collision calcula- 
tions for a large number of microdomains. 

ALGORITHM FOR OBTAINING 
DIFFUSION-COLLISION MODEL STATES 

As mentioned before, for a given number of pairings
n, the number of states is $2^n$. For actual calculations
we need to distinguish all of those states
and indicate to which coalesced pairings they
correspond. As can be seen from Table II, we identify these
states with numbers ranging from 1 to $2^n$. The
correspondence with the coalesced pairings is obtained
by numbering the states with binary numbers, where
the digit 1 at the appropriate position underneath the 
pairing, indicates that coalescence of that pairing oc- 
curs (see Table II). For example, state 1, the unfolded 
state, does not have a digit 1 in its binary representa- 
tion. State 2 indicates that the pairing AB coalesced, 
so correspondingly there is a digit 1 underneath the 
pairing AB. The same numbering is applied to all the 
states. The binary notation keeps the information of 
which pairings coalesce, and they are related to the 
decimal numbering of the states by turning the binary 
number into decimal plus 1. This is a very useful and 
condensed way of numbering all the states, except 
that for large number of pairings n, it is not easy to 
write down and number all the states. The following 
algorithm does that. 
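The numbering convention can be sketched in a few lines of Python. The example assumes the three pairings of Table II, with AB as the rightmost binary digit, and enumerates transitions as the formation of a single new pairing; the function names are hypothetical.

```python
PAIRINGS = ["AB", "AC", "BC"]  # AB is the rightmost binary digit, BC the leftmost

def binary_label(state_number, n):
    """Binary form of a state: decimal state number minus one, padded to n digits."""
    return format(state_number - 1, "0{}b".format(n))

def coalesced_pairings(state_number, pairings):
    """Pairings whose binary digit is 1 in the given state."""
    mask = state_number - 1
    return [name for bit, name in enumerate(pairings) if mask & (1 << bit)]

def transitions(n):
    """All (initial state, final state, bit of the bond formed) triples."""
    result = []
    for mask in range(2 ** n):
        for bit in range(n):
            if not mask & (1 << bit):
                result.append((mask + 1, (mask | (1 << bit)) + 1, bit))
    return result

for initial, final, bit in transitions(3):
    print("{}->{}".format(initial, final), "bond", PAIRINGS[bit],
          binary_label(initial, 3), "->", binary_label(final, 3))
```

For n = 3 this lists twelve transitions, one for each way of adding a single pairing to a state.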

It can be noted that the $2^n$ states, grouped by the
number of pairs in a state, are counted by the binomial
coefficients

$$ \sum_{k=0}^{n} \binom{n}{k} x^k = (1 + x)^n = 2^n \qquad (13) $$

where $x = 1$, so for one-pair states we have
$\binom{n}{1}$ different states, for two-pair states we have
$\binom{n}{2}$ different states, and so on. A schematic way of
representing the binomial coefficients is the familiar
Pascal's triangle, where the binomial coefficients in the
next row are simply related, by addition, to the
values in the previous row. This motivates using cellular
automata to create and represent the pair states
(binomial coefficients).

Cellular Automata 

Cellular automata are discrete space-time dynam- 
ical systems, where each cell has a set of possible 
values, belonging to a finite field. They reproduce on 
a space-time grid, evolving synchronously according 
to a specific mathematical rule usually in correlation 
with the number of cell neighbors on the grid, the 
most famous example being the Game of LifeH]. 

The simplest example of a cellular automaton
would be the one-dimensional case, where the grid
is made of segments on an infinite line, and
the dependent variable $q_k^t$ takes the values 0 and 1.
Here $t$ and $k$ are considered as time and space variables,
respectively. The initial data is a set of zeros and ones,
and the time evolution is given by the rule function
$f_r(q_k^t)$ that enables us to construct the next step. So
for any generation we have:

$$ q_k^{t+1} = f_r(q_k^t) \qquad (14) $$

Here the subscript r indicates the neighborhood of
the rule function, that is, on how many spatial neighbors
(2r + 1 in this case) the cell at position k depends:

$$ f_r(q_k^t) = f(q_{k-r}^t, \ldots, q_k^t, \ldots, q_{k+r}^t) \qquad (15) $$

For example, by taking the simple case of r = 1
(corresponding to nearest neighbor correlation) and
a rule function defined as:

$$ q_k^{t+1} = (q_{k-1}^t + q_{k+1}^t) \equiv q \pmod 2 \qquad (16) $$

where q is either 1 or 0 and $\equiv$ stands for
congruence (in the sense $0 \equiv 0 \pmod 2$;
$1 \equiv 1 \pmod 2$; $2 \equiv 0 \pmod 2$), we get the
pattern in Table V. As we can see, the rule of
multiplication is that a cell multiplies in the next
moment of time (generation) into positions k - 1 and k + 1.
In addition, there is the rule of annihilation
(overcrowding) of cells, which happens when a cell has two
neighbors (ones). The middle cell becomes zero from
overcrowding, since two new cells are born at the same
place. These rules apply to all generations (time
steps). This is just one example of a rule function,
whose pattern is a picture of Sierpinski's triangle, that
is, a fractal formed by deleting the inside triangle of
a larger equilateral triangle. Sierpinski's triangle can
also be generated from Pascal's triangle (binomial
coefficients) by deleting the even numbers from it. This
is the motivation for using a kind of cellular
automaton with a similar rule function as a way of
generating the pair states (binomial coefficients) for the
diffusion-collision model. The missing feature in the
rule for succeeding generations is the binary
representation of the states, which actually keeps the
information about the transitions, independent pathways
and the pairings involved. To solve this problem, an
additional feature is added to the cellular automata,
namely putting a binary genetic code in the cells of
the one-dimensional cellular automata. By adding
a rule to the evolution of the cellular automata,
involving the transfer of the binary genetic code to the
next cell, we will be able to keep track of kinetic
intermediates and get all the information necessary for
our purpose of describing the folding kinetics of the
states for the diffusion-collision model.
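The nearest-neighbor rule described in this section can be reproduced with a short Python sketch. It assumes a finite grid of 15 cells with zeros outside the grid and a single 1 in the middle as the initial data; the function names are hypothetical. The second function lays out the rows of Pascal's triangle modulo 2 on the same grid, so that the two patterns can be compared.

```python
from math import comb

def step(cells):
    """One generation of q_k(t+1) = (q_{k-1}(t) + q_{k+1}(t)) mod 2."""
    padded = [0] + cells + [0]  # cells outside the grid are taken to be 0
    return [(padded[k] + padded[k + 2]) % 2 for k in range(len(cells))]

def evolve(width, generations):
    """Rows of the automaton, starting from a single 1 in the middle cell."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(generations):
        cells = step(cells)
        rows.append(cells)
    return rows

def pascal_parity(t, width):
    """Row t of Pascal's triangle modulo 2, placed on every second cell."""
    row = [0] * width
    for j in range(t + 1):
        row[width // 2 - t + 2 * j] = comb(t, j) % 2
    return row

for row in evolve(15, 7):
    print("".join(str(q) for q in row))
```

Each generation of the automaton coincides with the corresponding row of Pascal's triangle reduced modulo 2, which is the link between the rule function and the binomial coefficients used in the text.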

Cellular Kinetics 

Different folding states are obtained by evolving
one-dimensional cellular automata with a binary
genetic code contained in each of the cells. At every new
generation the binary genetic code is mixed with the
parent's code from the previous generation. The rule
of mixing is the following: in the next generation, a 0
is attached from the right to the binary genetic code
of the cell at position k - 1, and a 1 is attached from
the right to the binary genetic code of the
cell at position k + 1. For example, starting from
generation t:

Generation (t):                 cell [ 011010 ]
Generation (t+1):   cell [ 011010 <-0 ]   cell [ 011010 <-1 ]

The binary genetic code for the first few generations is:

Gen. (t=0)            [0]
Gen. (t=1)       [0<-0]  [0<-1]

After the first generation, the initial binary genetic
code [0], at t = 0 and the k-th position, has one
0 added from the right at t = 1 and the (k - 1)-th
position, giving [00], and one 1 added from the right at
t = 1 and the (k + 1)-th position, giving [01].

Gen. (t=2)   [00<-0]   [00<-1]   [01<-1]
Gen. (t=2)             [01<-0]

From the second generation on, each cell multiplies to
the left and to the right, adding a 0 or a 1 respectively
to the end of its own binary genetic code, forming the
following cells:

Gen. (t=3)   [000<-0]   [000<-1]   [001<-1]   [011<-1]
Gen. (t=3)              [001<-0]   [010<-1]
Gen. (t=3)              [010<-0]   [011<-0]

We can see that by applying this rule we get exactly
the states we need for our purpose of describing the
diffusion-collision folding kinetics. Continuing this
simple rule lists all the states in binary/decimal
form, allowing fast and easy generation of the states for
a large number of pairings. Enumerating all the possible
states in this manner is very convenient for keeping track
of the independent pathways and transitions for
diffusion-collision folding kinetics.
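The mixing rule can be sketched in Python as follows. This is a minimal illustration under the assumptions that the automaton starts from a single cell with code "0" at position 0 and that the decimal state number is the binary value of the code plus one; the function names are hypothetical and the authors' MATLAB program may be organized differently.

```python
from collections import defaultdict

def evolve_codes(n_pairings):
    """Evolve the automaton for n_pairings generations; return {position: [codes]}."""
    cells = {0: ["0"]}
    for _ in range(n_pairings):
        next_cells = defaultdict(list)
        for position, codes in cells.items():
            for code in codes:
                next_cells[position - 1].append(code + "0")  # child on the left
                next_cells[position + 1].append(code + "1")  # child on the right
        cells = dict(next_cells)
    return cells

def states_by_pair_count(n_pairings):
    """Group the decimal state numbers by the number of coalesced pairings."""
    grouped = {}
    for position, codes in evolve_codes(n_pairings).items():
        pairs = (position + n_pairings) // 2  # number of 1 digits in each code
        grouped[pairs] = sorted(int(code, 2) + 1 for code in codes)
    return grouped

for pairs, states in sorted(states_by_pair_count(3).items()):
    print(pairs, states)
```

Each cell position collects the states with the same number of coalesced pairings, so the cell populations are the binomial coefficients.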

ALGORITHM FOR OBTAINING $R_{\max}$

A second important problem in calculations of
diffusion-collision folding kinetics for a large number
of pairings n is finding the upper bound of the
diffusion space, $R_{\max}$. As mentioned before, $R_{\max}$
needs to be found for every transition and is then used to
calculate the folding rate $k_f$ for that transition. The
problem becomes complicated even for a small number
of pairings, like n = 5 or 6. The different
microdomain conformations make it very hard to find
the maximal distance between the pairings
that are supposed to coalesce. Below is a schematic
illustration of the problem and the algorithm that is
used to solve it.

In the diffusion-collision model, the microdomain 
structure of a protein is assumed to be like a num- 
ber of beads on a string, all having different radii 
and distances between one another. In its unfolded 
state the protein will look, for example, like the string 
of 11 beads (microdomains) shown in Figure 3. To 
find the maximum distance between coalescing mi- 
crodomains, for example beads 2 and 7, we need only 
to sum up the distances 2-3, 3-4, 4-5, 5-6, 6-7, the 
diameters of beads 3,4,5 and 6 and the radii of beads 
2 and 7. This will be the maximum size of the (spher- 
ical) diffusion space. The minimum size is the radial 
distance between their centers, when bead 2 and 7 
are in contact. The situation quickly becomes com- 
plicated when more beads (microdomains) start to 
coalesce. For example, if we again have to find the
maximum distance between beads 2 and 7, but beads
3 and 9 have already coalesced, then we need to worry
about additional paths between beads 2 and 7. We
will then have several different ways of calculating
the distances. For example, from 2 to 7 we can go
through 2-3, 9-4-5-6-7, or 2-3, 9-8-7, as illustrated in
Figure 4. In this case, we need to find the shortest
of the possible paths between 2 and 7. Depending
on the distances and radii of the beads, any of
the available ways for a given conformation can be
the shortest. As more and more beads (microdomains)
coalesce, the situation soon gets very
complicated. In order to analyze the different
conformations, we first need a way of keeping
track of them. That can be done by using matrices.
The N-microdomain protein structure is kept in the
adjacency matrix (AM). It is an N x N matrix where
the nonzero elements indicate the adjacent neighbors.
For our example of N = 11 microdomains, the
adjacency matrix (AM) for the unfolded state is given in
Table VI. Coalescence between microdomains 3 and 9
changes the adjacency matrix (AM), so that 3 is now
a neighbor to 10 and 8, and 9 is a neighbor to 2 and
4, which is indicated in the new adjacency matrix in
Table VII as new nonzero elements.
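The bookkeeping described here can be sketched in Python. The example builds the adjacency matrix of an unfolded chain and then updates it for a coalesced pairing, under the assumption (taken from the 3-9 example in the text) that each bead of the pair inherits the chain neighbors of its partner and that the two beads are not marked as neighbors of each other. The function names are hypothetical.

```python
def chain_adjacency(n_beads):
    """Adjacency matrix of an unfolded chain: bead i is a neighbor of i-1 and i+1."""
    am = [[0] * n_beads for _ in range(n_beads)]
    for i in range(n_beads - 1):
        am[i][i + 1] = am[i + 1][i] = 1
    return am

def coalesce(am, a, b):
    """Return a new matrix in which beads a and b (1-based) share their neighbors."""
    new = [row[:] for row in am]
    ia, ib = a - 1, b - 1
    for k in range(len(am)):
        if am[ib][k] and k != ia:   # neighbors of b become neighbors of a
            new[ia][k] = new[k][ia] = 1
        if am[ia][k] and k != ib:   # neighbors of a become neighbors of b
            new[ib][k] = new[k][ib] = 1
    return new

unfolded = chain_adjacency(11)
paired = coalesce(unfolded, 3, 9)
for row in paired:
    print(" ".join(str(x) for x in row))
```

With the 3-9 pairing, row 3 gains entries in columns 8 and 10 and row 9 gains entries in columns 2 and 4, while the matrix stays symmetric.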

In order to find $R_{\max}$ between two beads i and j,
once we have a way of tracking the different
conformations, we need to consider all the possible ways of
getting from i to j for a given conformation of the
other beads. Multiplying the adjacency matrix AM
with itself does that. The power n of AM^n
corresponds to the number of steps between the beads. The
new nonzero elements in the product matrix contain all
of the possible routes that start from the initial bead i
and take n steps. In order to get to j we need to repeat
the procedure at most N - 1 times. This will cover all of
the possible steps (distances) between the coalescing
pair (i, j). For our purpose of finding $R_{\max}$, we just
need to choose a specific pattern of nonzero elements.
To get that pattern we need to form a matrix that
contains the i-th column of the correct power of AM,
as shown in Table VIII. This is an example of the
matrix that is used to determine the steps (distance)
from the first microdomain, i = 1. For example, for
j = 8, we need just the first 7 rows of the matrix in
Table VIII. The valid steps are the nonzero elements of
the matrix, and more specifically the various diagonal
ones. The actual numbers in the matrix
do not mean anything. What matters is whether they
are zero or not.
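The repeated multiplication can be illustrated with the Python sketch below. It assumes the N = 11 chain with the 3-9 pairing described in the text, builds the rows obtained from successive powers of AM for the starting bead i = 1, and reads off the smallest number of steps needed to reach a bead j. The function names are hypothetical.

```python
def coalesced_chain(n_beads, pairs=()):
    """Adjacency matrix of a chain in which each listed pair shares its neighbors."""
    am = [[0] * n_beads for _ in range(n_beads)]
    for i in range(n_beads - 1):
        am[i][i + 1] = am[i + 1][i] = 1
    for a, b in pairs:
        ia, ib = a - 1, b - 1
        near_a = [k for k in range(n_beads) if am[ia][k]]
        near_b = [k for k in range(n_beads) if am[ib][k]]
        for k in near_b:
            if k != ia:
                am[ia][k] = am[k][ia] = 1
        for k in near_a:
            if k != ib:
                am[ib][k] = am[k][ib] = 1
    return am

def step_rows(am, start, max_steps):
    """Row k holds the column of AM^k that belongs to the starting bead (1-based)."""
    n = len(am)
    rows = [[1 if k == start - 1 else 0 for k in range(n)]]
    for _ in range(max_steps):
        prev = rows[-1]
        rows.append([sum(am[j][k] * prev[k] for k in range(n)) for j in range(n)])
    return rows

def min_steps(am, i, j):
    """Smallest power of AM with a nonzero (i, j) element, or None if unreachable."""
    for steps, row in enumerate(step_rows(am, i, len(am))):
        if row[j - 1]:
            return steps
    return None

am = coalesced_chain(11, [(3, 9)])
for row in step_rows(am, 1, 5):
    print(row)
print(min_steps(am, 1, 8))
```

Only the zero or nonzero pattern of each row matters for locating valid steps; the magnitudes simply count the number of distinct walks.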



The program that calculates $R_{\max}$ starts from the
initial microdomain and goes to the neighboring one
in a certain direction (left and right diagonal, since
we count the distance between neighbors). There are
two different directions in which the steps (distance) can
be taken, indicated by the yellow color (from left to
right) and green (from right to left). In the Matlab
program for this algorithm, this information is kept
in a cell array of the position in the matrix and the
direction (left/right). An additional feature is that
when the step comes to a microdomain index that
is part of a pairing (as microdomains 3-9 in the
previous example), the step splits into left and right.
This is indicated by the light blue colored elements
in the matrix. Associating the radii of the
microdomains with the elements in the matrix,
and the transitions from row to row in a certain
direction with the distance between the microdomains,
gives an easy way to find the distance ($R_{\max}$)
between any of the microdomains.
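The shortest-route search can also be sketched with a standard graph algorithm. The Python example below is not the matrix-pattern procedure of the MATLAB program: it swaps in Dijkstra's algorithm on a weighted graph. It assumes that a chain step between neighboring beads costs the sum of their radii plus the linker length between them, and, as a further assumption of this sketch, that a coalesced pair is joined by a contact step equal to the sum of the two radii. All names and numbers are hypothetical.

```python
import heapq

def build_graph(radii, linkers, pairs=()):
    """radii[i]: radius of bead i+1; linkers[i]: chain length between beads i+1 and i+2."""
    n = len(radii)
    graph = {i: [] for i in range(n)}
    for i in range(n - 1):
        w = radii[i] + linkers[i] + radii[i + 1]   # chain step between neighbors
        graph[i].append((i + 1, w))
        graph[i + 1].append((i, w))
    for a, b in pairs:
        w = radii[a - 1] + radii[b - 1]            # assumed contact step for a coalesced pair
        graph[a - 1].append((b - 1, w))
        graph[b - 1].append((a - 1, w))
    return graph

def r_max(graph, i, j):
    """Shortest route length between beads i and j (1-based) by Dijkstra's algorithm."""
    dist = {i - 1: 0.0}
    heap = [(0.0, i - 1)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == j - 1:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

radii = [1.0] * 11      # hypothetical radii
linkers = [1.0] * 10    # hypothetical linker lengths
print(r_max(build_graph(radii, linkers), 2, 7))
print(r_max(build_graph(radii, linkers, [(3, 9)]), 2, 7))
```

For the unfolded chain the result reduces to the sum described in the text: the linker lengths 2-3 through 6-7, the diameters of beads 3 to 6, and the radii of beads 2 and 7.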



SUMMARY 

The diffusion-collision model was suggested in 1976
as a model for the process of protein folding based
on the dynamical interactions of microdomains, as a
series of diffusion-collision steps. Since then the
computational power of computers has greatly increased,
which enables faster calculations in the model. However,
the combinatorial calculation of states, independent
pathways, transitions, and $R_{\max}$ is too lengthy and
too susceptible to error if carried out by hand for each
protein to be studied. Introducing the two described
algorithms reduces all of these calculations to several
minutes, even for a large number of pairings. The
algorithms have been implemented as MATLAB programs.
For a particular protein, the number of microdomains,
the number of pairings and actual pairs, microdomain
radii and distances between microdomains are input
parameters.

Table I. Example of a three microdomain protein data.*

Microdomain   Radius (Å)   Distance (Å)   Access area (Å^2)
A             rA                          A_A
B             rB                          A_B
C             rC                          A_C
AB                         cAB            A_AB
AC                                        A_AC
BC                         cBC            A_BC
ABC                                       A_ABC

*Numerical values of these properties are the input parameters in diffusion-collision model calculations.



Table II. Possible states for a three-microdomain protein.*

Unfolded         One-Pair         Two-Pair         Folded
#  BC AC AB      #  BC AC AB      #  BC AC AB      #  BC AC AB
1  0  0  0       2  0  0  1       4  0  1  1       8  1  1  1
                 3  0  1  0       6  1  0  1
                 5  1  0  0       7  1  1  0

*A schematic description of the possible kinetic intermediate states. Decimal numbers enumerate the states, while the binary digit 1 in the 
binary numbers represents a coalesced pair. The decimal number is obtained by adding one to the decimal value of the binary number. 

Table III. Transition States, Bonds and Parameters of three-microdomain protein.*

Transition   Initial State   Bond Formed   Final State   R_min     R_max
1->2         A-B-C           AB            AB-C          rA+rB     rA+cAB+rB
1->3         A-B-C           AC            B-AC          rA+rC     rA+cAB+2rB+cBC+rC
1->5         A-B-C           BC            A-BC          rB+rC     rB+cBC+rC
2->4         AB-C            AC            ABC_1         rAB+rC    rAB+cBC+rC
2->6         AB-C            BC            ABC_2         rAB+rC    rAB+cBC+rC
3->4         B-AC            AB            ABC_1         rB+rAC    rB+cBC+rAC
3->7         B-AC            BC            ABC_3         rB+rAC    rB+cBC+rAC
5->6         A-BC            AB            ABC_2         rA+rBC    rA+cAB+rBC
5->7         A-BC            AC            ABC_3         rA+rBC    rA+cAB+rBC
4->8         ABC_1           BC            ABC_4         rB+rC     πrABC
6->8         ABC_2           AC            ABC_4         rA+rC     πrABC
7->8         ABC_3           AB            ABC_4         rA+rB     πrABC

*Based on the initial parameters, the data in this table is obtained from the diffusion-collision
model calculations. The data is then used to calculate the probabilities of the kinetic states
and the folding and unfolding rates by solving the diffusion equation.



Table IV. Combinatorial dependence on the number of pairings.*

# of pairings   # of states   # of independent   # of transitions
n               2^n           pathways n!        n 2^(n-1)
1               2             1                  1
2               4             2                  4
3               8             6                  12
4               16            24                 32
5               32            120                80
6               64            720                192
7               128           5040               448
8               256           40320              1024
9               512           362880             2304

*Considering a large number of pairings n between the microdomains of the protein, the number
of states, transitions, and independent pathways increases quickly, thus creating combinatorial
complexity in the calculations.

Table V. Simple cellular automaton pattern

000000010000000
000000101000000
000001000100000
000010101010000
000100000001000
001010000010100
010001000100010
101010101010101



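Table VI. Adjacency matrix for N = 11 with no pairings

AM(1,:)    0 1 0 0 0 0 0 0 0 0 0
AM(2,:)    1 0 1 0 0 0 0 0 0 0 0
AM(3,:)    0 1 0 1 0 0 0 0 0 0 0
AM(4,:)    0 0 1 0 1 0 0 0 0 0 0
AM(5,:)    0 0 0 1 0 1 0 0 0 0 0
AM(6,:)    0 0 0 0 1 0 1 0 0 0 0
AM(7,:)    0 0 0 0 0 1 0 1 0 0 0
AM(8,:)    0 0 0 0 0 0 1 0 1 0 0
AM(9,:)    0 0 0 0 0 0 0 1 0 1 0
AM(10,:)   0 0 0 0 0 0 0 0 1 0 1
AM(11,:)   0 0 0 0 0 0 0 0 0 1 0
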

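Table VII. Adjacency matrix for N = 11 with microdomains 3-9 pairing

AM(1,:)    0 1 0 0 0 0 0 0 0 0 0
AM(2,:)    1 0 1 0 0 0 0 0 1 0 0
AM(3,:)    0 1 0 1 0 0 0 1 0 1 0
AM(4,:)    0 0 1 0 1 0 0 0 1 0 0
AM(5,:)    0 0 0 1 0 1 0 0 0 0 0
AM(6,:)    0 0 0 0 1 0 1 0 0 0 0
AM(7,:)    0 0 0 0 0 1 0 1 0 0 0
AM(8,:)    0 0 1 0 0 0 1 0 1 0 0
AM(9,:)    0 1 0 1 0 0 0 1 0 1 0
AM(10,:)   0 0 1 0 0 0 0 0 1 0 1
AM(11,:)   0 0 0 0 0 0 0 0 0 1 0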

Table VIII. Adjacency matrix pattern used to find $R_{\max}$

Initial            1      0      0      0      0      0      0      0      0      0      0
AM(1,:)            0      1      0      0      0      0      0      0      0      0      0
AM^2(2,:)          1      0      1      0      0      0      0      0      1      0      0
AM^3(3,:)          0      3      0      2      0      0      0      2      0      2      0
AM^4(4,:)          3      0      9      0      2      0      2      0      9      0      2
AM^5(5,:)          0     21      0     20      0      4      0     20      0     20      0
AM^6(6,:)         21      0     81      0     24      0     24      0     81      0     20
AM^7(7,:)          0    183      0    186      0     48      0    186      0    182      0
AM^8(8,:)        183      0    737      0    234      0    234      0    737      0    182
AM^9(9,:)          0   1657      0   1708      0    468      0   1708      0   1656      0
AM^10(10,:)     1657      0   6729      0   2176      0   2176      0   6729      0   1656
AM^11(11,:)        0  15115      0  15634      0   4352      0  15634      0  15114      0

Figure 2.

Figure 3. A representation of an 11-microdomain unfolded protein chain.

Figure 4. A representation of an 11-microdomain protein chain with one coalesced pairing between microdomains 3 and 9.
The pairing has changed the adjacency matrix.